The organizational development of growing random networks is investigated.
These growing networks are built by adding nodes successively and linking each
to an earlier node of degree $k$ with attachment probability $A_k$. When $A_k$
grows slower than linearly with $k$, the number of nodes with $k$ links,
$N_k(t)$, decays faster than a power law in $k$, while for $A_k$ growing faster
than linearly in $k$, a single node emerges which connects to nearly all other
nodes. When $A_k$ is asymptotically linear, $N_k(t) \sim t\,k^{-\nu}$, with
$\nu$ dependent on details of the attachment probability, but in the range
$2 < \nu < \infty$. The combined age and degree distribution of nodes shows that
old nodes typically have a large degree. There is also a significant correlation
in the degrees of neighboring nodes, so that nodes of similar degree are more
likely to be connected. The size distributions of the in-components and
out-components of the network with respect to a given node - namely, its
"descendants" and "ancestors" - are also determined. The in-component exhibits a
robust $s^{-2}$ power-law tail, where $s$ is the component size. The
out-component has a typical size of order $\ln t$ and it provides basic insights
about the genealogy of the network.
PACS numbers: 02.50.Cw, 05.40.-a, 05.50.+q, 87.18.Sn



I. INTRODUCTION 

Networks of many interacting units play an important
role in epidemiology, ecology, gene regulation, neural
networks, and many other fields. In many studies of
these networks, the number of nodes is considered to be 
fixed and the presence of a link between two nodes is 
treated as a random event independent of the other links. 
These assumptions lead naturally to random graph mod- 
els [lljs] . While these models have rich behavior and con- 
siderable utility, they are not necessarily appropriate for 
describing growing networks, where the addition of nodes 
and links may depend on the local features of the network 
where the growth event is taking place. 

Typical examples of such growing networks include 
transportation or electrical distribution systems, where 
growth occurs in response to population-driven demands. 
Two currently appealing examples are the distribution 
of scientific citations and the structure of the world-wide 
web. For both these examples there is now considerable 
data available, in spite of the very rapid growth of these 
systems. In the former case, one may consider papers to 
be the nodes of a graph and citations as the links. The 
structure of the resulting "citation graph" was originally 
studied by Lotka in 1926 and then by many others 
|0-|l3| . The basic feature of this citation distribution is 
that it appears to have a relatively steep power-law tail; 
thus most papers are minimally cited while highly-cited 
papers are rare. 

Similarly, in the web graph, much structural data has
recently been obtained [14-21], which suggests that the
number of nodes with k links has a power-law tail, with 
an exponent that is somewhat larger than 2. This power- 
law tail again corresponds to the basic fact that most 
nodes of the web graph are unimportant, while a rela- 
tively small number of nodes garner a large fraction of 
"hits" . Due to the qualitative similarities between the 



citation and web graphs, insights developed in the field 
of bibliometrics have been applied to help understand 
the structure of the web j2^] . 

Because of the dynamic nature of the citation and 
web graphs, it is not surprising that their topologies at 
any fixed time are very different from classical random 
graphs. In distinction to the power-law degree distribu- 
tions of the citation and web graphs, random graphs have 
a Poisson node degree distribution. Here node degree is 
defined as the number of links at a node. To overcome 
the shortcomings of random graphs in describing the 
dynamic natures of these systems, both "small-world" 
networks p^ , |2^ and growing random network models 
p0| , p5| -[2^ have been recently introduced. The former 
are aimed at understanding the relatively small diame- 
ter of large graphs of socially interacting units, while the 
latter seek to understand the growth dynamics. 

In this paper, we provide a comprehensive quantita- 
tive description of a simple growing network (GN) model. 
Our results are based on the analysis of the rate equa- 
tions for the densities of nodes of a given degree. This 
approach bears many similarities to the rate equations 
for the kinetics of aggregation. The rate equations for 
the evolution of growing networks are relatively simple 
and the results that emerge are comprehensive. Thus it 
appears that the rate equation method is better suited for 
probing the structure of growing random networks com- 
pared to the classical approaches for analyzing random 
graphs, such as probabilistic Q or generating function 
Q techniques. The rate equation approach also has the 
advantage that it can be adapted to other evolving graph 
systems, including networks with addition and deletion of 
nodes and links, as well as networks with link re-wiring. 

We will specifically investigate two types of models:
(a) the GN, in which nodes are added one at a time and
a link is established with a pre-existing node according
to an attachment probability $A_k$ which depends only on
the degree of the target node (Fig. 1), and (b) the GN
with re-direction (GNR), in which the newly-created link
can be re-directed to the "ancestor" node of the original
target node. An important feature of these models is
that the links are directed and the resulting graphs have
a simple tree-like topology. The motivation for the GNR
model is that this re-direction process roughly mimics
how we might (lazily) construct the references to this
paper. In addition to papers that we peruse and cite
directly, we are also likely to incorporate some of the
references within these papers as part of our reference list.
A related "copying" process also affects the organization
of the web [15].




FIG. 1. Schematic illustration of the evolution of the grow- 
ing random network. Nodes are added sequentially and a 
single link joins the new node to an earlier node. In this ex- 
ample, node 1 has degree 5, node 2 has degree 3, nodes 4 and 
6 have degree 2, and all the remaining nodes have degree 1. 
Also note that node 1 is the "ancestor" of 6, while 10 is the 
descendant of 6. 

One of our primary results is that for asymptotically
linear attachment kernels, $A_k \sim k$ as $k \to \infty$, the
degree distribution of the GN has a power-law form
$N_k(t) \sim t\,k^{-\nu}$, with $\nu$ tunable in the range
$2 < \nu < \infty$. By choosing the control parameters of our
model in a plausible manner it is then easy to reproduce the
quantitative observations about the node degree distribution
of the web graph.

In Sec. II, we define the GN and GNR models precisely
and then determine their node degree distributions
in Sec. III by the rate equation approach. Different
distributions arise in the GN model which depend on the
asymptotic behavior of the attachment probability as a
function of node degree. In Sec. IV, we investigate the
joint age-degree distribution and find (not surprisingly)
that "old" nodes are typically more highly connected. In
Sec. V, we study the correlations which develop between
the degrees of connected nodes as the network grows. In
Sec. VI, we study a more global measure of the network,
namely, the size distributions of the "in-component" and
"out-component". With respect to a given node $x$, the
in-component is the set of nodes which can reach node $x$ via
a directed path of links. Conversely, the out-component
is the set of nodes which can be reached from node $x$ via
a directed path. The former exhibits a robust power-law
size distribution which appears to be independent of the
attachment probability. The latter distribution predicts
a network "diameter" which grows as $\ln t$ and also
provides basic insights about the genealogy of the network.
We conclude in Section VII.



II. THE MODELS 
A. Growing Network (GN) 

In the GN, we introduce a new node at each time step 
and link it to one of the earlier nodes in the network 
(Fig. 1). This leads to a network which has a topology
of a (directed) tree graph. In terms of citations, we may 
interpret the nodes as publications, and the directed link 
from one paper to another as a citation to the earlier 
publication. In terms of the web graph, nodes are web 
pages and the directed links are hyperlinks. We will refer 
to the node to which the link is directed as the ancestor 
of the current node. 

As the network grows, a degree distribution $N_k(t)$,
defined as the average number of nodes with $k$ links ($k-1$
incoming and 1 outgoing), builds up. The initial node is
unique as it does not have an outgoing link. The basic
ingredient which determines the structure of the network
is the attachment kernel $A_k$, defined as the probability
that the newly-introduced node links to an existing node
which already has $k$ links. On general grounds, this
attachment kernel should be a non-decreasing function of
$k$, and natural scenarios are attachment kernels with a
power-law dependence on $k$. For the linear kernel, the GN
reduces to the "scale free" model introduced by Barabasi 
and Albert ^ and further investigated in ||2|-|2^. 
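
The growth rule just described can be sketched as a short simulation. This is only an illustrative sketch, not the authors' code: the function name `grow_network`, the seed, the network size, and the choice to give the lone initial node unit weight for the very first attachment are all assumptions made here.

```python
import random
from collections import Counter

def grow_network(num_nodes, kernel, seed=0):
    """Grow a GN tree and return the list of node degrees.

    kernel(k) is the (unnormalized) attachment rate A_k of a node with k links.
    """
    rng = random.Random(seed)
    degree = [0]                      # the initial node starts with no links
    for _ in range(1, num_nodes):
        # Weight of each existing node. Assumption: the lone initial node
        # (degree 0) gets weight 1 so that the first link can form.
        weights = [kernel(k) if k > 0 else 1.0 for k in degree]
        target = rng.choices(range(len(degree)), weights=weights)[0]
        degree[target] += 1           # one more incoming link on the target
        degree.append(1)              # the new node has one outgoing link
    return degree

degrees = grow_network(2000, kernel=lambda k: k)   # linear kernel A_k = k
counts = Counter(degrees)                          # estimate of N_k at t = 2000
```

Passing `kernel=lambda k: k ** 0.5` or `kernel=lambda k: k ** 1.5` instead would give sub-linear or super-linear growth in the same sketch; the quadratic cost of rebuilding the weights at every step is accepted here for clarity.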

The general homogeneous model, $A_k = k^\gamma$ with $\gamma > 0$,
was investigated in [28], where it was found that the
degree distribution $N_k(t)$ crucially depends on the value of
$\gamma$. For $\gamma < 1$, the linking probability grows weakly with
node "popularity" and $N_k(t)$ decreases as a stretched
exponential in $k$ for any $t$. The complementary case of
$\gamma > 1$ leads to a phenomenon akin to gelation ||2^ in which
a single "gel" node links to nearly every other node. For
$\gamma > 2$, this phenomenon is so extreme that the number
of links between other nodes is finite in an infinite graph.
We shall show that these results also apply for the more
general situation where $A_k \sim k^\gamma$ as $k \to \infty$, in addition
to the strictly homogeneous situation where $A_k = k^\gamma$.

The borderline case of an asymptotically linear
attachment kernel, $A_k \sim k$, is particularly intriguing as it leads
to $N_k \sim k^{-\nu}$, with the exponent $\nu$ tunable to any value
larger than 2 depending on finer details of the attachment
kernel. In particular, the strictly linear kernel, $A_k = k$,
leads to $\nu = 3$. However, by changing the value of a single
attachment probability, for example $A_1 = \alpha$ and $A_k = k$
for $k \ge 2$, any value of $\nu > 2$ is possible. This sensitivity
of asymptotic behavior to microscopic details indicates
that the case of attachment index $\gamma = 1$ is marginal. A
related phenomenon occurs in constant-kernel aggregation,
where the asymptotic kinetics is sensitively dependent
on the actual values of the reaction rate.



B. Growing Network with Re-direction (GNR) 

The GN is built by simultaneous node and link addition
and disregards other elemental processes which can
occur in the development of large networks. In the
context of the web, these include node and link deletion (for
out-of-date websites), link re-wiring, the tendency of a
new node to connect to nearby nodes, and the copying of
links from existing nodes to new nodes. The GNR model
incorporates a simple form of link re-wiring into the GN
model. At each time step, a new node $n$ is added and an
earlier node $x$ is selected uniformly as a possible "target"
for attachment. With probability $1 - r$, the link from $n$
to $x$ is created; in this case, the evolution is the same
as in the GN. However, with probability $r$, the link is
re-directed to the ancestor node $y$ of node $x$ (Fig. 2).




FIG. 2. Illustration of the basic processes in the GNR
model. The new node (solid) selects a target node $x$. With
probability $1 - r$ a link is established to this target node
(dashed arrow), while with probability $r$ the link is
established with the ancestor of $x$ (thick solid arrow).

A model of this spirit was recently mentioned in the 
context of the web development [15]. A related model
was also proposed long ago by Simon |^,^ to describe 
the word frequencies of English text. The Simon model 
gives a power-law frequency distribution whose exponent
is tunable in a manner which closely mirrors the behavior
in the GNR model. The Simon model was also recently
applied to explain power-law distributions in the
frequency of family names js^. 

While at first sight the GNR model appears complicated,
we shall see that its characteristics can be obtained
in a simple fashion. Another very helpful and surprising
property of the GNR with a uniform initial attachment
probability is that it is equivalent to the GN with
a shifted linear attachment kernel and no re-direction.
We shall exploit this equivalence extensively in the
following. Nevertheless, we consider the GNR separately,
as in many cases the rate equations for the GNR with a
uniform attachment kernel are simpler to appreciate than
the rate equations for the GN with a shifted linear
attachment kernel.
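
The GNR growth step can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the name `grow_gnr` is hypothetical, and the sketch assumes that when the uniformly chosen target is the initial node (which has no ancestor) the link simply stays on that node.

```python
import random

def grow_gnr(num_nodes, r, seed=0):
    """Grow a GNR tree; return (ancestor, degree) lists indexed by node."""
    rng = random.Random(seed)
    ancestor = [None]                 # ancestor[i] is the node that i links to
    degree = [0]                      # the initial node has no outgoing link
    for new in range(1, num_nodes):
        target = rng.randrange(new)   # uniform choice among earlier nodes
        # Assumption: the initial node has no ancestor, so no re-direction there.
        if ancestor[target] is not None and rng.random() < r:
            target = ancestor[target] # re-direct the link to the ancestor
        ancestor.append(target)
        degree[target] += 1
        degree.append(1)
    return ancestor, degree

ancestor, degree = grow_gnr(5000, r=0.5)
```

With `r=0` every link goes to the uniformly chosen target, and with `r=1` every link is passed up to the initial node, so the two extremes are easy to check by hand.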



III. THE DEGREE DISTRIBUTION 

A. GN Model 

We now study the evolution of the degree distribution
of the GN model. The rate equations for $N_k(t)$ are

$$\frac{dN_k}{dt} = A^{-1}\left[A_{k-1}N_{k-1} - A_k N_k\right] + \delta_{k1}. \qquad (1)$$

The first term on the right-hand side of Eq. (1) accounts
for the process in which a node with $k-1$ links is
connected to the new node, leading to a gain in the number
of nodes with $k$ links. This happens with probability
$A_{k-1}/A$, where $A(t) = \sum_{j\ge 1} A_j N_j(t)$ is the appropriate
normalization factor. A corresponding role is played by
the second (loss) term on the right-hand side of Eq. (1).
Notice that the overall amplitude in $A_k$ is irrelevant, since
it appears in both the numerator and denominator of
Eq. (1), and can be chosen arbitrarily. The last term on
the right-hand side of Eq. (1) accounts for the continuous
introduction of new nodes with no incoming links. We
also set $N_0 = 0$, so that Eq. (1) applies for all $k \ge 1$.

It is worth noting that at a fundamental level, Eqs. (1)
describe the symbolic reaction $[k] \to [k+1]$. Many other
reactions, such as the Becker-Doring theory of nucleation 
psf , additive polymerization |Q, hydrolysis |3^], catal- 
ysis and submonolayer epitaxial growth [ p8| , fit into this 
scheme. However, there is one important difference in 
that we consider strictly a single connected cluster (the 
growing network), while in the context of aggregation- 
like processes, one generally deals with a collection of 
clusters. The effect of having more than one cluster in 
the framework of growing networks is currently under 
investigation [39].

We start by solving the equations for the low-order
moments of the degree distribution, which are defined
by $M_n(t) = \sum_{j\ge 1} j^n N_j(t)$. Summing Eqs. (1) over all
$k$ gives the rate equation for the total number of nodes,
$\dot M_0 = 1$, whose solution is $M_0(t) = M_0(0) + t$. The
first moment (the total number of link endpoints) obeys
$\dot M_1 = 2$, which gives $M_1(t) = M_1(0) + 2t$. The first
two moments are therefore independent of the attachment
kernel $A_k$, while higher moments and the degree
distribution itself do depend on the kernel $A_k$.

To develop an appreciation for the types of behavior
that can occur, consider the linear kernel $A_k = k$, for
which $A(t)$ coincides with $M_1(t)$. In this case, we can
solve Eqs. (1) for an arbitrary initial condition. However,
since the long-time behavior is most interesting,
we limit ourselves to the asymptotic regime ($t \to \infty$)
where the initial condition is irrelevant. Using therefore
$M_1 \simeq 2t$, we solve the first few of Eqs. (1) and obtain
$N_1 = 2t/3$, $N_2 = t/6$, etc., which implies that the
$N_k$ grow linearly with time. Accordingly, we substitute
$N_k(t) = t\,n_k$ in Eqs. (1) to yield the simple recursion
relation $n_k = n_{k-1}(k-1)/(k+2)$. Solving for $n_k$ gives

$$n_k = \frac{4}{k(k+1)(k+2)}. \qquad (2)$$



In the context of discrete functions defined on the positive
integers, this distribution is algebraic over the entire
range of $k$. Indeed, as explained in Ref. the proper
analog of the continuous power-law function $f(x) = x^{-\lambda}$
is the discrete function $f_k = \Gamma(k)/\Gamma(k+\lambda)$, where $\Gamma$
is the Euler gamma function. Rewriting Eq. (2) as
$n_k = 4\,\Gamma(k)/\Gamma(k+3)$, we see that $n_k$ is indeed algebraic
over the entire range $k \ge 1$.
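
As a quick consistency check of the linear-kernel solution, the recursion can be iterated in exact rational arithmetic and compared with the closed form of Eq. (2). The function name and the cutoff of 50 terms are arbitrary choices made for this sketch.

```python
from fractions import Fraction

def linear_kernel_distribution(k_max):
    """Iterate n_k = n_{k-1} (k-1)/(k+2) starting from n_1 = 2/3."""
    n = {1: Fraction(2, 3)}
    for k in range(2, k_max + 1):
        n[k] = n[k - 1] * Fraction(k - 1, k + 2)
    return n

n = linear_kernel_distribution(50)
closed_form = {k: Fraction(4, k * (k + 1) * (k + 2)) for k in n}
assert n == closed_form   # the recursion reproduces 4/[k(k+1)(k+2)] exactly
```

The partial sums of these fractions approach one, consistent with the normalization $\sum_k N_k = M_0 \simeq t$.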

Returning to more general attachment kernels, let us
assume that the degree distribution and $A(t)$ both grow
linearly with time. We anticipate that this hypothesis
will hold for attachment kernels which do not grow faster
than linearly with $k$. By substituting $N_k(t) = t\,n_k$ and
$A(t) = \mu t$ into Eqs. (1) we obtain the recursion relation
$n_k = n_{k-1}A_{k-1}/(\mu + A_k)$ and $n_1 = \mu/(\mu + A_1)$. Solving
for $n_k$, we obtain

$$n_k = \frac{\mu}{A_k}\prod_{j=1}^{k}\left(1 + \frac{\mu}{A_j}\right)^{-1}. \qquad (3)$$



To complete the solution, we need to find the amplitude
$\mu$. Combining the definition $\mu = \sum_{j\ge 1} A_j n_j$ and Eq. (3),
we obtain the implicit relation

$$\sum_{k=1}^{\infty}\prod_{j=1}^{k}\left(1 + \frac{\mu}{A_j}\right)^{-1} = 1. \qquad (4)$$

Thus the amplitude $\mu$ always depends on the entire
attachment kernel. On the other hand, we shall show that
the degree distribution exhibits a robust behavior which
depends only on gross features of the attachment kernel,
as long as $A_k$ grows slower than linearly. The case
where $A_k$ is asymptotically linear is perhaps the most
intriguing as the degree distribution has a power-law
behavior whose exponent depends on microscopic details
of the dependence of $A_k$ on $k$. When $A_k$ grows faster
than linearly, drastically different gelation-like behavior
arises. It is again worth noting that these three regimes
of kinetic behavior also arise in the solutions to the rate
equations for additive polymerization processes, with the
different regimes arising when the attachment exponent
$\gamma$ is smaller than, larger than, or equal to one pl[ .
We now separately describe these three cases.

1. Sub-linear kernels

Consider sub-linear kernels which are asymptotically
homogeneous, that is, $A_k \sim k^\gamma$, with $0 < \gamma < 1$.
Substituting this asymptotics into Eq. (3), writing the product
as the exponential of a sum, converting the sum to an
integral, and performing this integral, we obtain

$$n_k \sim \begin{cases}
k^{-\gamma}\exp\left[-\mu\,\dfrac{k^{1-\gamma}}{1-\gamma}\right], & \tfrac{1}{2} < \gamma < 1; \\[2ex]
k^{(\mu^2-1)/2}\exp\left[-2\mu\sqrt{k}\,\right], & \gamma = \tfrac{1}{2}; \\[2ex]
k^{-\gamma}\exp\left[-\mu\,\dfrac{k^{1-\gamma}}{1-\gamma} + \dfrac{\mu^2}{2}\,\dfrac{k^{1-2\gamma}}{1-2\gamma}\right], & \tfrac{1}{3} < \gamma < \tfrac{1}{2};
\end{cases} \qquad (5)$$

etc. The pattern given in Eq. (5) continues ad infinitum:
Whenever $\gamma$ decreases below $1/m$, with $m$ a positive
integer, an additional term in the exponential arises from
the now relevant contribution of the next higher-order
term in the expansion of the product in Eq. (3).



FIG. 3. The amplitude $\mu$ in $M_\gamma(t) = \mu t$ versus $\gamma$.

To complete the solution, we require the amplitude $\mu$.
We have been unable to find an explicit expression for
$\mu$, even if the attachment kernel is strictly homogeneous,
$A_k = k^\gamma$, as it requires solving Eq. (4). However, this
relation can be easily evaluated numerically and it shows
that $\mu(\gamma)$ varies smoothly between 1 and 2 as $\gamma$ increases
from 0 to 1 (Fig. 3). These two limits correspond to the
known limiting behaviors for $M_0$ and $M_1$.
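
One simple way to carry out such a numerical evaluation is bisection on the implicit relation (4) for the homogeneous kernel. The sketch below is an assumption-laden illustration, not the authors' procedure: the sum is truncated at a hypothetical cutoff `k_max`, the bracket [0.5, 3] is chosen by hand, and the function name `amplitude_mu` is invented here.

```python
def amplitude_mu(gamma, k_max=20000, iterations=40):
    """Solve sum_k prod_{j<=k} (1 + mu/A_j)^(-1) = 1 for mu, with A_j = j**gamma."""
    kernel = [j ** gamma for j in range(1, k_max + 1)]

    def lhs(mu):
        total, product = 0.0, 1.0
        for a_j in kernel:
            product /= 1.0 + mu / a_j   # extend the product to the next j
            total += product            # add the k-th term of the sum
        return total

    lo, hi = 0.5, 3.0                   # assumed bracket: lhs(lo) > 1 > lhs(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if lhs(mid) > 1.0:              # lhs decreases as mu increases
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for gamma in (0.0, 0.5, 1.0):
    print(f"gamma = {gamma:.1f}  mu = {amplitude_mu(gamma):.4f}")
```

For the linear kernel the terms of the sum decay only as a power law, so the truncation biases the result slightly below 2; a larger `k_max` reduces this bias at the cost of run time.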

More detailed results can be obtained for the limiting
solvable cases of $A_k = \mathrm{const.}$ and $A_k = k$. In these limits,
$\mu = 1$ and $\mu = 2$, respectively, and the corresponding
degree distributions are given by $n_k = 2^{-k}$ and by Eq. (2).
The former can be easily obtained by following exactly
the same steps as those used to solve the network with
the linear kernel. We can then apply perturbation theory
to find the respective limiting behaviors of $\mu(\gamma)$ for
$\gamma$ close to 0 or 1,

$$\mu = 1 + B_0\gamma + O(\gamma^2), \qquad \mu = 2 - B_1(1-\gamma) + O\big((1-\gamma)^2\big),$$

with

$$B_0 = \sum_{j=1}^{\infty}\frac{\ln j}{2^j} = 0.5078, \qquad B_1 = 4\sum_{j=1}^{\infty}\frac{\ln j}{(j+1)(j+2)} = 2.407.$$
2. Linear kernels

Consider now asymptotically linear attachment kernels,
$A_k \sim k$ as $k \to \infty$. As already mentioned, we can always
choose the amplitude in the asymptotic relation equal to
one, as attachment kernels which differ by a multiplicative
factor give identical behavior. For the asymptotically
linear kernel, expanding the product in Eq. (3) and
following step-by-step the approach that led to Eq. (5) now
gives the power-law asymptotic behavior

$$n_k \sim k^{-\nu}, \quad \text{with} \quad \nu = 1 + \mu. \qquad (6)$$

An important feature of this result is that the exponent
$\nu$ can be tuned to any value larger than 2. This lower
bound immediately follows from the fact that the sum
$\sum_j j\,n_j$ must converge and this, in turn,
requires that $\nu$ must be larger than 2.

As an explicit example, consider the attachment kernel
$A_k = k$ for $k \ge 2$, while $A_1 = \alpha$ is an arbitrary positive
number. Now it is convenient to separately treat $A_1$ and
$A_k$ for $k \ge 2$ in Eq. (4) to recast it as

$$\mu = \alpha\sum_{k=2}^{\infty}\prod_{j=2}^{k}\left(1 + \frac{\mu}{j}\right)^{-1}. \qquad (7)$$

The right-hand side of Eq. (7) can be simply expressed
as the ratio of Euler gamma functions to yield

$$\mu = \alpha\,\Gamma(2+\mu)\sum_{k=2}^{\infty}\frac{\Gamma(1+k)}{\Gamma(1+\mu+k)}. \qquad (8)$$

This sum can be evaluated by employing the identity [40]

$$\sum_{k=2}^{\infty}\frac{\Gamma(a+k)}{\Gamma(b+k)} = \frac{\Gamma(a+2)}{(b-a-1)\,\Gamma(b+1)}, \qquad (9)$$

so that Eq. (8) reduces to $\mu(\mu-1) = 2\alpha$, with solution
$\mu = (1+\sqrt{1+8\alpha})/2$. Thus the exponent $\nu = 1 + \mu$ is

$$\nu = \frac{3+\sqrt{1+8\alpha}}{2}. \qquad (10)$$



Furthermore, following the steps that lead to Eq. (2),
the degree distribution for the GN with the attachment
kernel $A_1 = \alpha$ and $A_k = k$ for $k \ge 2$ is

$$n_1 = \frac{\mu}{\mu+\alpha}, \qquad n_k = \frac{\mu\alpha}{\mu+\alpha}\,\frac{\Gamma(2+\mu)\,\Gamma(k)}{\Gamma(1+\mu+k)} \quad (k \ge 2). \qquad (11)$$

Notice that for $0 < \alpha < 1$, the exponent lies in the range
$2 < \nu < 3$; in particular, $\nu = 2 + 2\alpha - 4\alpha^2 + \ldots$ as $\alpha \to 0$.
When $\alpha = 1$, we recover the connectivity distribution
of Eq. (2). For $\alpha > 1$, we have $\nu > 3$; in particular,
$\nu \to \sqrt{2\alpha}$ as $\alpha \to \infty$.
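
The dependence of the exponent on the single rate $A_1 = \alpha$ is easy to tabulate, and the self-consistency condition $\mu = \sum_k A_k n_k$ can be checked by iterating the recursion $n_k = n_{k-1}A_{k-1}/(\mu + A_k)$. The helper names and the cutoff `k_max` below are illustrative assumptions, not part of the original analysis.

```python
import math

def mu_from_alpha(alpha):
    """Positive root of mu*(mu - 1) = 2*alpha."""
    return (1.0 + math.sqrt(1.0 + 8.0 * alpha)) / 2.0

def nu_from_alpha(alpha):
    """Degree exponent nu = 1 + mu for A_1 = alpha, A_k = k (k >= 2)."""
    return 1.0 + mu_from_alpha(alpha)

def degree_fractions(alpha, k_max):
    """n_1..n_kmax from the recursion n_k = n_{k-1} A_{k-1} / (mu + A_k)."""
    mu = mu_from_alpha(alpha)
    n = [mu / (mu + alpha)]                 # n_1 = mu / (mu + A_1)
    n.append(n[0] * alpha / (mu + 2.0))     # n_2 uses A_1 = alpha and A_2 = 2
    for k in range(3, k_max + 1):
        n.append(n[-1] * (k - 1) / (mu + k))
    return n

def kernel_average(alpha, k_max=2000):
    """Truncated sum of A_k n_k, which should approach mu."""
    n = degree_fractions(alpha, k_max)
    return alpha * n[0] + sum(k * n[k - 1] for k in range(2, k_max + 1))

for alpha in (0.1, 1.0, 3.0):
    print(alpha, nu_from_alpha(alpha), kernel_average(alpha))
```

The truncated sum converges slowly when $\nu$ is close to 2 (small $\alpha$), so the agreement with $\mu$ is best judged for $\alpha$ of order one or larger.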

The GN is also solvable when $A_k = k + w$. This
shifted linear kernel can be motivated naturally by
explicitly keeping track of the directionality of the links.
In particular, the node degree for an undirected graph
generalizes to the in-degree and out-degree for a directed
graph. These are just the number of incoming and outgoing
links at a node, respectively. Thus, the node degree $k$
in a directed graph is the sum of the in-degree $i$ and
out-degree $j$. The most general linear attachment kernel for a
directed graph is therefore of the form $A_{ij} = ai + bj$. The
GN corresponds to the case where the out-degree of any
node equals one; thus $j = 1$ and $k = i + 1$. Hence the
general linear attachment kernel reduces to $A_k = a(k-1) + b$.
Since, as mentioned above, the overall scale factor in the
kernel is irrelevant, we can re-write $A_k$ as the shifted
linear kernel $A_k = k + w$, with $w = -1 + b/a$, so that it can
vary over the range $-1 < w < \infty$.

We can now easily determine the degree distribution
for the shifted linear attachment kernel. First we note
that $A(t) = \sum_j A_j N_j = M_1(t) + w M_0(t)$. Then using
the basic results $A = \mu t$, $M_0 = t$ and $M_1 = 2t$, we have
$\mu = 2 + w$ and thence $\nu = 3 + w$, according to Eq. (6).
Furthermore, from Eq. (3) we easily determine the entire
degree distribution to be

$$n_k = (2+w)\,\frac{\Gamma(3+2w)}{\Gamma(1+w)}\,\frac{\Gamma(k+w)}{\Gamma(k+3+2w)}. \qquad (12)$$

In a similar vein, we can solve the GN with an arbitrary
piecewise linear attachment kernel. In all these cases, the
exponent $\nu$ can be tuned to any value larger than 2, and
for sufficiently large degree $n_k$ can be expressed as the
ratio of gamma functions, i.e., the degree distribution is
a purely (discrete) algebraic function.
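
The closed form for the shifted linear kernel can be cross-checked against the recursion $n_k = n_{k-1}A_{k-1}/(\mu + A_k)$ with $A_k = k + w$ and $\mu = 2 + w$. The sketch below uses log-gamma functions to avoid overflow; the function names and the test values of $w$ are choices made for this illustration only.

```python
import math

def n_closed(k, w):
    """Eq. (12): (2+w) Gamma(3+2w)/Gamma(1+w) * Gamma(k+w)/Gamma(k+3+2w)."""
    log_n = (math.log(2 + w) + math.lgamma(3 + 2 * w) - math.lgamma(1 + w)
             + math.lgamma(k + w) - math.lgamma(k + 3 + 2 * w))
    return math.exp(log_n)

def n_recursive(k_max, w):
    """n_1..n_kmax from the recursion with A_k = k + w and mu = 2 + w."""
    mu = 2 + w
    n = [mu / (mu + 1 + w)]                       # n_1 = mu / (mu + A_1)
    for k in range(2, k_max + 1):
        n.append(n[-1] * (k - 1 + w) / (mu + k + w))
    return n

for w in (-0.5, 0.0, 1.5):
    rec = n_recursive(200, w)
    worst = max(abs(rec[k - 1] / n_closed(k, w) - 1) for k in range(1, 201))
    print(f"w = {w:+.1f}  max relative difference = {worst:.2e}")
```

For $w = 0$ both expressions reduce to the strictly linear result $4/[k(k+1)(k+2)]$.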



3. Super-linear kernels

For the super-linear homogeneous attachment kernels,
$A_k = k^\gamma$ with $\gamma > 1$, we now show that a "winner
take all" phenomenon arises, namely, there emerges a
single dominant "gel" node which is linked to almost
every other node. A particularly singular behavior occurs
for $\gamma > 2$, where there is a non-zero probability that
the initial node is connected to every other node of the
network.

Let us first determine the probability that the initial
node connects to all other nodes. It is convenient to
consider a discrete time version of the GN in which one node
is introduced at each elemental step and always links to
the initial node. After $N$ steps, the probability that the
new node will link to the initial node is $N^\gamma/(N + N^\gamma)$.
The probability that this connectivity pattern continues
indefinitely is

$$\mathcal{P} = \prod_{N=1}^{\infty}\frac{1}{1 + N^{1-\gamma}}. \qquad (13)$$

Clearly, $\mathcal{P} = 0$ when $\gamma \le 2$ but $\mathcal{P} > 0$ when $\gamma > 2$. Thus
for $\gamma > 2$ there is a non-zero probability that the initial
node connects to all other nodes.
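
The change of behavior at $\gamma = 2$ can be seen by evaluating partial products of Eq. (13). The sketch below is illustrative only; the function name and the cutoffs are assumptions, and the sum of logarithms is used simply to keep the product numerically stable.

```python
import math

def runaway_probability(gamma, n_max):
    """Partial product of Eq. (13): prod_{N=1}^{n_max} 1 / (1 + N**(1 - gamma))."""
    log_p = 0.0
    for n in range(1, n_max + 1):
        log_p -= math.log1p(n ** (1.0 - gamma))
    return math.exp(log_p)

for gamma in (1.5, 2.0, 2.5, 3.0):
    row = [runaway_probability(gamma, n_max) for n_max in (10**2, 10**3, 10**4)]
    print(gamma, row)
```

At $\gamma = 2$ the partial product equals $1/(n_{\max}+1)$ and keeps shrinking, whereas for $\gamma = 3$ it settles quickly to the known value $\pi/\sinh\pi$ of the infinite product $\prod_N (1+N^{-2})^{-1}$.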






To determine the behavior for general $\gamma > 1$, we first
need the asymptotic time dependence of $M_\gamma$. To this
end, it is useful to consider the discretized version of the
master equations Eq. (1), where the time $t$ is limited to
integer values. Then $N_k(t) = 0$ whenever $k > t$ and the
rate equation for $N_k(k)$ immediately leads to

$$N_k(k) = \frac{(k-1)^\gamma N_{k-1}(k-1)}{M_\gamma(k-1)} = N_2(2)\prod_{j=2}^{k-1}\frac{j^\gamma}{M_\gamma(j)}. \qquad (14)$$

From this, and the obvious fact that $N_k(k)$ must be less
than unity, it follows that $M_\gamma(t)$ cannot grow more slowly
than $t^\gamma$. On the other hand, $M_\gamma(t)$ cannot grow faster
than $t^\gamma$, as follows from the estimate

$$M_\gamma(t) = \sum_{k=1}^{t} k^\gamma N_k(t) \le t^{\gamma-1}\sum_{k=1}^{t} k\,N_k(t) = t^{\gamma-1} M_1(t). \qquad (15)$$

Thus $M_\gamma \propto t^\gamma$. In fact, the amplitude of $t^\gamma$ is unity, as
we will derive self-consistently after solving for the $N_k$'s.

We now use $M_\gamma \sim t^\gamma$, with $\gamma > 1$, in the rate
equations to solve recursively for each $N_k$. Starting with
the equation $\dot N_1 = 1 - N_1/M_\gamma$, we see that the second
term on the right-hand side is sub-dominant. Thus
by neglecting this term we obtain $N_1 = t$. Similarly,
$\dot N_2 = (N_1 - 2^\gamma N_2)/M_\gamma \sim N_1/M_\gamma$ gives $N_2 \sim t^{2-\gamma}/(2-\gamma)$.
Continuing this same line of reasoning for each successive
rate equation gives the leading behavior of $N_k$,

$$N_k(t) = J_k\,t^{\,k-(k-1)\gamma} \quad \text{for } k \ge 1, \qquad (16)$$

with $J_k = \prod_{j=1}^{k-1} j^\gamma/[1 + j(1-\gamma)]$. This pattern of
behavior for $N_k$ continues as long as its exponent $k - (k-1)\gamma$
remains positive, or $k < \gamma/(\gamma-1)$. The full temporal
behavior of the $N_k(t)$ may be determined straightforwardly
by keeping the next correction terms in the rate
equations. For example, $N_1(t) = t - t^{2-\gamma}/(2-\gamma) + \ldots$.

For $k > \gamma/(\gamma-1)$, each $N_k$ has a finite limiting value
in the long-time limit. Since the total number of
connections equals $2t$, and $t$ of them are associated with $N_1$,
the remaining $t$ links must all connect to a single node
which has $t$ connections (up to corrections which grow
no faster than sub-linearly with time). Consequently the
amplitude of $M_\gamma$ equals unity, as argued above.

Therefore for super-linear kernels, the GN undergoes
an infinite sequence of connectivity transitions as a
function of $\gamma$. For $\gamma > 2$ all but a finite number of nodes are
linked to the "gel" node which has the rest of the links of
the network. This is the "winner take all" situation. For
$3/2 < \gamma < 2$, the number of nodes with two links grows as
$t^{2-\gamma}$, while the number of nodes with more than two links
is again finite. For $4/3 < \gamma < 3/2$, the number of nodes
with three links grows as $t^{3-2\gamma}$ and the number with more
than three is finite. Generally for $(m+1)/m < \gamma < m/(m-1)$, the
number of nodes with more than $m$ links is finite, while
$N_k \sim t^{k-(k-1)\gamma}$ for $k \le m$. Logarithmic corrections also
arise at the transition points.
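
The sequence of transitions can be made concrete by listing, for a given $\gamma > 1$, the degrees $k$ whose populations still grow with time together with their growth exponents $k-(k-1)\gamma$. The function name below is hypothetical, and values of $\gamma$ exactly at a transition point (where logarithmic corrections arise) are deliberately not treated.

```python
def growing_classes(gamma):
    """Map each degree k with a growing population to its exponent k-(k-1)*gamma.

    Assumes gamma > 1 and gamma not exactly at a transition point.
    """
    exponents = {}
    k = 1
    while k - (k - 1) * gamma > 0:
        exponents[k] = k - (k - 1) * gamma
        k += 1
    return exponents

for gamma in (2.5, 1.75, 1.4, 1.2):
    print(gamma, growing_classes(gamma))
```

The largest key returned is the integer $m$ of the window $(m+1)/m < \gamma < m/(m-1)$; for $\gamma > 2$ only $k = 1$ appears, which is the "winner take all" regime.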



B. Relation to citation data 

Let us now attempt to relate some of our predictions
from the GN model to the distribution of citations in
recent scientific publications [11,12]. The GN model
represents an extreme idealization of the citation process in
which each publication cites only a single paper and the
probability of citing a paper depends only on its current
number of citations, and not on its intrinsic quality or
any other realistic features. Thus we anticipate that the
connection between the model and the data will be, at
best, tenuous.

The data that we discuss is based on: (a) 783,339 pa- 
pers with 6,716,198 citations (provided by the Institute 
of Scientific Information (ISI)), and (b) 24,296 papers 
with 351,872 citations from all issues of Physical Re- 
view D (PRD) from 1975-1994 (provided by the SPIRES 
database) [11]. A cursory visual inspection of this data
suggests that the number of publications with $k$ citations
decays as a stretched exponential function of $k$ (see
e.g., Fig. 1 of Ref. [Q). However, an analysis based on
presenting the data in a Zipf plot, in conjunction with
scaling, is suggestive of a power-law form for the citation
distribution, $k^{-\nu}$, with $\nu \approx 3$ (Fig. 2 of Ref. ). This
ambiguity between a stretched exponential and power- 
law form for the citation distribution corresponds to the 
situation where the predictions of the GN itself are diffi- 
cult to discern numerically. 

If we consider the GN with attachment kernel $A_k \sim k^\gamma$
for $\gamma \lesssim 1$, then a plot of $n_k$ in Eq. (5) versus $k$, for
$1 \le k \le 1000$, changes relatively slowly as $\gamma$ varies in
the range (0.9, 1). If one attempts to fit this data to a
power law, then an exponent value somewhat larger than
3 gives a reasonable fit to the data. It is only as $\gamma \to 1$
from below, however, that the factors in the exponential
of Eq. (5) conspire to give a pure power-law form for
$n_k$. Because of the relatively small change in $n_k$ as $\gamma$
varies, the relatively incomplete data on the distribution
of citations is insufficient to provide a clear test for the
existence of a power law. Further, for the GN model with
linear attachment kernel, the degree distribution depends
on additional details of this kernel and its exponent can
take any value greater than 2. In short, it is difficult to
relate the GN model to citation data based on the form of the
distribution alone.

Another interesting aspect of the citation distribution
which can be compared with the GN model is the nature
of highly-cited publications. Within the GN model, the
degree of the most popular node, $k_{\max}$, may be determined
by the extreme statistics criterion
$\sum_{k \ge k_{\max}} N_k = 1$, which states that there is one node
in the network whose degree lies in the range $(k_{\max}, \infty)$.
This criterion gives

$$k_{\max} \sim \begin{cases}
(\ln t)^{1/(1-\gamma)}, & 0 \le \gamma < 1; \\
t^{1/(\nu-1)}, & \text{asymptotically linear}; \\
t, & \gamma > 1.
\end{cases} \qquad (17)$$

We now compare this prediction with the data about the
most-cited paper. To make a correspondence between
citations and Eq. (17), we identify the total number of
publications in each dataset with $t$. The most cited paper
had 8,904 citations in the ISI data set and 2,026 citations
in the PRD data set. These results are consistent with
the first line of Eq. (17) when $\gamma \approx 0.86$ and $\gamma \approx 0.7$,
respectively, and also with the second line for $\nu \approx 2.5$ and
$\nu \approx 2.3$, respectively. Thus an analysis of the most-cited
paper does not cleanly indicate whether the citation
distribution is a power law or a stretched exponential.
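
The power-law reading of the most-cited-paper data can be reproduced with a one-line inversion of $k_{\max} \sim t^{1/(\nu-1)}$. This sketch assumes the proportionality constant is exactly one, which the scaling relation itself does not fix, and the function name is invented here. The stretched-exponential line is not inverted, since its prefactor matters more and is not specified in the text.

```python
import math

def nu_from_most_cited(num_papers, max_citations):
    """Invert k_max = t**(1/(nu-1)), assuming a unit prefactor."""
    return 1.0 + math.log(num_papers) / math.log(max_citations)

nu_isi = nu_from_most_cited(783_339, 8_904)   # ISI data set
nu_prd = nu_from_most_cited(24_296, 2_026)    # PRD data set
print(f"ISI: nu = {nu_isi:.2f}   PRD: nu = {nu_prd:.2f}")
```

Both values land close to the exponents quoted above for the second line of the criterion.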

These ambiguities indicate some of the issues that
should be clarified to provide a clear description of
citations in terms of a growing network model.



is a power law with exponent = 1 + 1/r, which can be 
tuned to any value larger than 2. This exponent value 
was first obtained in Simon's original paper , but in a 
rather different context, by employing an approach which 
is similar to ours. 



IV. THE AGE DISTRIBUTION 

In addition to the distribution of degree, we study when 
connections occur in the GN. This provides a deeper un- 
derstanding of the overall development of growing net- 
works. Naively, we expect that older nodes will be bet- 
ter connected and this can be quantified by categorizing 
nodes both by their degree and their age. It should be 
emphasized, that the GN does not have explicit aging, 
in which the connection probability depends on the age 
of the target node; this feature is treated in Ref. P6[ . 
Instead, we are merely extending the categorization of 
node to include their age as well as their degree. 

A. Linear connection Icernel 



C. GNR Model 

We now solve the GNR model within the rate equa- 
tion framework. According to the basic processes in the 
model (Fig. |^), the degree distribution Nk{t) evolves by 
the rate equations 



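$$\frac{dN_k}{dt} = \frac{1-r}{M_0}\left[N_{k-1} - N_k\right] + \delta_{k1} + \frac{r}{M_0}\left[(k-2)N_{k-1} - (k-1)N_k\right]. \tag{18}$$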



For re-direction probability $r = 0$, the first three terms on the right-hand side of Eq. (18) are the same as in the GN. The last two terms account for the change in $N_k$ due to the re-direction process. To understand their origin, consider the gain term due to re-direction. Since the initial node is chosen uniformly, if re-direction does occur, the probability that a node with $k-1$ pre-existing links receives the new "re-directed" link is proportional to $k-2$, the number of pre-existing incoming links. A similar argument applies for the re-direction-driven loss term. Since $N_0 = 0$ is tacitly assumed, Eq. (18) applies for all $k \geq 1$.
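The re-direction rule can be checked with a small simulation. The sketch below is illustrative only: the function names, the network size, and the treatment of the seed (a link that would be re-directed from the seed stays on the seed) are assumptions, not part of the original analysis. It compares the measured fraction of nodes without incoming links with the value $n_1 = 1/(2-r)$ that follows from the $k=1$ rate equation, $n_1 = 1-(1-r)n_1$, on setting $N_1 = t\,n_1$ and $M_0 = t$.

```python
import random

def simulate_gnr(n_nodes, r, seed=0):
    """Grow a GNR network; return total degrees (incoming links + 1)."""
    rng = random.Random(seed)
    ancestor = [None]        # node 0 is the seed and has no ancestor
    in_links = [0]
    for new in range(1, n_nodes):
        target = rng.randrange(new)          # uniformly chosen earlier node
        # With probability r the link is re-directed to the target's ancestor.
        # Assumption: the seed has no ancestor, so the link stays on the seed.
        if rng.random() < r and ancestor[target] is not None:
            target = ancestor[target]
        ancestor.append(target)
        in_links.append(0)
        in_links[target] += 1
    # The same +1 offset is applied to the seed for simplicity.
    return [k + 1 for k in in_links]

def fraction_with_degree(degrees, k):
    return sum(1 for d in degrees if d == k) / len(degrees)

r = 0.5
degrees = simulate_gnr(20000, r)
n1_measured = fraction_with_degree(degrees, 1)
n1_predicted = 1 / (2 - r)   # k = 1 case of the rate equation
```

For $r = 1/2$ the prediction is $n_1 = 2/3$, the same value as for the GN with the purely linear kernel.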

By combining the terms in Eq. (18), the rate equation reduces to that of the original GN with $A_k = (k-1)r + 1 - r = r[k - 1 + (1-r)/r]$. By scaling out the factor $r$, we then reduce $A_k$ to the shifted linear kernel $k + w$, with $w = (1-r)/r - 1 = 1/r - 2$. Thus we can merely transcribe our results about the GN with the shifted linear kernel to determine the degree distribution for the GNR model. Amusingly, for $r = 1/2$, the GNR model is identical to the GN with the purely linear kernel. In general, the degree distribution in the GNR model



Let $c_k(t,a)$ be the average number of nodes of age $a$ which have $k-1$ incoming links at time $t$. Here age $a$ means that the node was introduced at time $t-a$. That is, we are now resolving each node both by its degree and its age. The resulting joint age-degree distribution is simply related to the degree distribution through $N_k(t) = \int_0^t da\, c_k(t,a)$. The joint distribution evolves according to

$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial a}\right) c_k = \frac{A_{k-1}c_{k-1} - A_k c_k}{A(t)} + \delta_{k1}\,\delta(a). \tag{19}$$



The second term on the left accounts for the aging of nodes, and the probability of connecting to a given node again depends only on its degree and not on its age.

We start by considering the linear attachment kernel, $A_k = k$, and focus on the long-time asymptotic behavior. Then we can disregard the initial condition and write $A(t) = M_1(t) = 2t$. This transforms Eqs. (19) into

$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial a}\right) c_k = \frac{(k-1)c_{k-1} - k c_k}{2t} + \delta_{k1}\,\delta(a). \tag{20}$$



The homogeneous form of this equation implies that the solution should be self-similar. We therefore seek a solution as a function of the single variable $a/t$ rather than of two separate variables, and write

$$c_k(t,a) = f_k(x) \quad\text{with}\quad x = 1 - \frac{a}{t}. \tag{21}$$

This turns the partial differential equation (20) into the ordinary differential equation

$$-2x\,\frac{df_k}{dx} = (k-1)f_{k-1} - k f_k. \tag{22}$$

We have omitted the delta function term, since it merely provides the boundary condition $c_k(t,a=0) = \delta_{k1}$, or

$$f_k(1) = \delta_{k1}. \tag{23}$$
FIG. 4. Age-dependent degree distribution for the GN for
the linear attachment kernel. Low-degree nodes tend to be 
relatively young while high-degree nodes are old. The inset 
shows detail for a/t > 0.98. 

The solution to this boundary-value problem may be simplified by assuming the exponential solution $f_k = \Phi\,\varphi^{k-1}$; this is consistent with the boundary condition, provided that $\Phi(1) = 1$ and $\varphi(1) = 0$. The above ansatz reduces the infinite set of rate equations (22) into two elementary differential equations for $\varphi(x)$ and $\Phi(x)$ whose solutions are $\varphi(x) = 1 - \sqrt{x}$ and $\Phi(x) = \sqrt{x}$. In terms of the original variables $a$ and $t$, the joint age-degree distribution is then

$$c_k(t,a) = \sqrt{1-\frac{a}{t}}\,\left\{1 - \sqrt{1-\frac{a}{t}}\right\}^{k-1}. \tag{24}$$



Thus the degree distribution for nodes of fixed age decays exponentially with degree, with a characteristic degree which diverges as $\langle k\rangle \sim (1-a/t)^{-1/2}$ for $a\to t$. As expected, young nodes (those with $a/t\to 0$) typically have a small degree while old nodes have large degree (Fig. 4). It is the slow decay of the degree distribution for old nodes which ultimately leads to a power-law degree distribution when this joint age-degree distribution is integrated over all ages to give $N_k(t)$.
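The integration over ages can be checked numerically. The following sketch (function names and the midpoint rule are illustrative choices, not from the original) integrates Eq. (24) in the scaled variable $x = 1-a/t$ and compares the result with $n_k = 4/[k(k+1)(k+2)]$, which is the $p=1$ case of Eq. (70).

```python
def f_linear(k, x):
    """Eq. (24) written in the scaled variable x = 1 - a/t."""
    s = x ** 0.5
    return s * (1.0 - s) ** (k - 1)

def integrate_over_ages(k, steps=200000):
    """Midpoint-rule estimate of the integral of f_k(x) over 0 < x < 1."""
    h = 1.0 / steps
    return h * sum(f_linear(k, (i + 0.5) * h) for i in range(steps))

def n_linear(k):
    """Degree distribution of the linear kernel, n_k = 4/[k(k+1)(k+2)]."""
    return 4.0 / (k * (k + 1) * (k + 2))

# (k, numerical integral, closed form) for a few degrees
checks = [(k, integrate_over_ages(k), n_linear(k)) for k in (1, 2, 5, 10)]
```

The agreement reflects the exact identity $\int_0^1 dx\,\sqrt{x}\,(1-\sqrt{x})^{k-1} = 2B(3,k)$, so each age-resolved distribution is exponential in $k$ while their superposition is a power law.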



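We now solve Eqs. (25), subject to the boundary condition (23), with the amplitude $\mu$ fixed by $A(t) = \mu t$. Let us first replace $x$ by $X = -\mu^{-1}\ln x$, so that $x = e^{-\mu X}$, which reduces the left-hand side of (25) to $df_k/dX$. Applying a Laplace transform, $\hat f_k(s) = \int_0^\infty dX\, e^{-sX} f_k(X)$, we find that $\hat f_k(s)$ obeys a simple algebraic recursion formula whose solution is

$$\hat f_k(s) = \frac{1}{A_k}\prod_{j=1}^{k}\left(1+\frac{s}{A_j}\right)^{-1}. \tag{26}$$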



Apart from notation, this is identical to Eq. (|3|) and 
can be analyzed accordingly. In particular, we can determine $\hat f_k(s)$ for various asymptotically linear attachment kernels. For example, for the shifted linear attachment kernel, $A_k = k + w$, we find

$$\hat f_k(s) = \frac{\Gamma(1+w+s)}{\Gamma(1+w)}\,\frac{\Gamma(k+w)}{\Gamma(k+1+w+s)}. \tag{27}$$



To invert this Laplace transform, it is useful to rewrite this expression as a sum of rational functions, $\hat f_k(s) = \sum_{1\leq j\leq k} p_{kj}/(s+j+w)$. This then gives $f_k(X) = \sum_{1\leq j\leq k} p_{kj}\,e^{-(j+w)X}$, with

$$p_{kj} = \frac{(-1)^{j-1}\,\Gamma(k+w)}{\Gamma(j)\,\Gamma(k-j+1)\,\Gamma(1+w)}. \tag{28}$$



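We then re-express this in terms of the original variable $x = e^{-(2+w)X}$. Hence $f_k(x)$ can be rewritten as the sum of $k$ power laws, $f_k(x) = \sum_{1\leq j\leq k} p_{kj}\,x^{(j+w)/(2+w)}$, with the coefficients $p_{kj}$ of Eq. (28). Substituting the explicit expressions (28) into this sum reduces the joint age-degree distribution to

$$f_k(x) = \frac{\Gamma(k+w)}{\Gamma(k)\,\Gamma(1+w)}\;x^{\frac{1+w}{2+w}}\left(1 - x^{\frac{1}{2+w}}\right)^{k-1}. \tag{29}$$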
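As a consistency check, the sum of $k$ power laws with the coefficients of Eq. (28) can be compared numerically with the closed form (29). This is only a sketch; the function names and the sample values of $k$, $w$ and $x$ are arbitrary illustrative choices.

```python
from math import gamma

def p_coeff(k, j, w):
    """Partial-fraction coefficient of Eq. (28)."""
    return (-1) ** (j - 1) * gamma(k + w) / (gamma(j) * gamma(k - j + 1) * gamma(1 + w))

def f_sum(k, x, w):
    """Sum of k power laws x**((j + w)/(2 + w)) weighted by Eq. (28)."""
    return sum(p_coeff(k, j, w) * x ** ((j + w) / (2 + w)) for j in range(1, k + 1))

def f_closed(k, x, w):
    """Closed form of Eq. (29)."""
    pref = gamma(k + w) / (gamma(k) * gamma(1 + w))
    return pref * x ** ((1 + w) / (2 + w)) * (1 - x ** (1 / (2 + w))) ** (k - 1)

def max_difference(ks, ws, xs):
    """Largest absolute mismatch between the two forms on a small grid."""
    return max(abs(f_sum(k, x, w) - f_closed(k, x, w))
               for k in ks for w in ws for x in xs)

worst = max_difference(range(1, 7), (0.0, 0.5, 2.0), (0.1, 0.5, 0.9))
```

The two forms agree because the alternating sum over $j$ is a binomial expansion of $(1 - x^{1/(2+w)})^{k-1}$.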



This expression shows that old nodes have a broad distribution of degrees up to a characteristic degree $\langle k\rangle = (1-a/t)^{-1/(2+w)}$. One can also verify that the average age $a_k$ of nodes of degree $k$, defined as $a_k = N_k^{-1}\int_0^t da\, a\, c_k(t,a) = t\,n_k^{-1}\int_0^1 dx\,(1-x) f_k(x)$, is

$$\frac{a_k}{t} = 1 - \frac{\Gamma(5+3w)}{\Gamma(3+2w)}\,\frac{\Gamma(k+3+2w)}{\Gamma(k+5+3w)} \to 1 - \frac{\text{const.}}{k^{2+w}}. \tag{30}$$



B. General connection kernels 

Let us now consider the GN with a connection kernel which grows either linearly or more slowly with $k$. The ansatz (21) is still valid, so that the distribution $f_k$ evolves according to

$$-\mu x\,\frac{df_k}{dx} = A_{k-1}f_{k-1} - A_k f_k. \tag{25}$$



Thus nodes with very large degree necessarily have an 
age which approaches that of the entire network. 

Finally, the joint age-degree distribution simplifies in the limit $k\to\infty$ and $x\to 0$, with the scaling variable $\xi = k\,x^{1/(2+w)}$ kept finite. In this case, we can rewrite (29) in the scaling form

$$f_k(x) = k^{-1}F(\xi), \qquad F(\xi) = \frac{\xi^{1+w}}{\Gamma(1+w)}\,\exp(-\xi). \tag{31}$$
The scaling variable can also be written as $\xi = k/\langle k\rangle$, and thus Eq. (31) clearly shows that old nodes have a broad distribution of degrees: $1 < k < \langle k\rangle$.

We can derive explicit age-degree distributions for other attachment kernels. For example, for the constant attachment kernel, $A_k = 1$, the joint age-degree distribution is the Poisson distribution,

$$f_k(X) = \frac{X^{k-1}}{(k-1)!}\,e^{-X}, \tag{32}$$

or in terms of the original variables $a$ and $t$,

$$c_k(t,a) = \left(1-\frac{a}{t}\right)\frac{|\ln(1-a/t)|^{k-1}}{(k-1)!}. \tag{33}$$



The characteristic degree now diverges relatively slowly, viz. $\langle k\rangle = -\ln(1-a/t)$ as $a\to t$, compared with asymptotically linear attachment kernels. On the other hand, the average age approaches the maximal age $t$ at a much faster rate, $a_k = t\,[1-(2/3)^k]$, as $k$ approaches its maximal value.

For cases where we have been unable to obtain an explicit solution, the Laplace transform method still allows us to extract the asymptotics. For example, for asymptotically homogeneous attachment kernels, $A_k\sim k^{\gamma}$ as $k\to\infty$, Eq. (26) gives the large-$k$ asymptotics $\hat f_k(s)\to k^{-\gamma}\exp[-s\,k^{1-\gamma}/(1-\gamma)]$. (For concreteness, we consider here the range $1/2<\gamma<1$.) Inverting this Laplace transform yields

$$f_k(X) \simeq k^{-\gamma}\,\delta\!\left(X - \frac{k^{1-\gamma}}{1-\gamma}\right). \tag{34}$$

In particular, the age of nodes with $k$ links is peaked about the value $a_k$ which satisfies

$$\frac{a_k}{t} \simeq 1 - \exp\left[-\mu\,\frac{k^{1-\gamma}}{1-\gamma}\right]. \tag{35}$$



This again shows that old nodes are much better con- 
nected. 



V. NODE DEGREE CORRELATIONS 

We now demonstrate that correlations between the de- 
grees of connected nodes spontaneously develop as the 
network grows. One motivation for focusing on these cor- 
relations is that recently random graph models with arbi- 
trary degree distributions have been investigated [§^-§5| . 
While the degree distribution can be chosen arbitrarily 
in these models, the degrees of connected nodes are un- 
correlated. This lack of correlation suggests that such 
random graphs may have limited applicability to grow- 
ing network systems. 

For the GN, a useful characterization of node degree correlations is $N_{kl}(t)$, the number of nodes of total degree $k$ which attach to an ancestor node of total degree $l$.

For example, in the network of Fig. 1, there are $N_1 = 6$ nodes of degree 1, with $N_{12} = N_{13} = N_{15} = 2$. There are also $N_2 = 2$ nodes of degree 2, with $N_{25} = 2$, and $N_3 = 1$ node of degree 3, with $N_{35} = 1$. The correlation function is not defined for the initial node. Generally, $N_{kl}$ is defined for $k\geq 1$ and $l\geq 2$, and obeys the sum rule $N_k = \sum_l N_{kl}$. A gratifying feature of the rate equation approach is that the correlation function $N_{kl}$ can be understood in a natural and simple fashion.
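The bookkeeping behind $N_{kl}$ can be made concrete on a small tree. The ancestor list below is a hypothetical 10-node network chosen only to reproduce the counts quoted in the example; it is an assumption and is not read off the original figure. The function name is likewise illustrative.

```python
from collections import Counter

# Hypothetical ancestor list: entry i is the ancestor of node i + 1,
# and node 0 is the seed (which has no ancestor).
ancestor = [0, 0, 0, 0, 0, 1, 2, 3, 3]

def degree_correlations(ancestor):
    n = len(ancestor) + 1
    degree = [1] * n      # every non-seed node has one outgoing link
    degree[0] = 0         # the seed has no outgoing link
    for a in ancestor:
        degree[a] += 1    # each attachment adds one incoming link
    # N_kl: nodes of degree k attached to an ancestor of degree l
    n_kl = Counter((degree[i + 1], degree[a]) for i, a in enumerate(ancestor))
    # N_k for all nodes except the seed, for which N_kl is not defined
    n_k = Counter(degree[1:])
    return degree, n_k, n_kl

degree, n_k, n_kl = degree_correlations(ancestor)
```

Summing the entries of `n_kl` over `l` at fixed `k` reproduces `n_k`, which is the sum rule $N_k = \sum_l N_{kl}$.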



A. Linear connection kernel 

For the GN with the linear attachment kernel $A_k = k$, the joint distribution $N_{kl}(t)$ evolves according to

$$M_1\,\frac{dN_{kl}}{dt} = \left[(k-1)N_{k-1,l} - kN_{kl}\right] + \left[(l-1)N_{k,l-1} - lN_{kl}\right] + (l-1)N_{l-1}\,\delta_{k1}. \tag{36}$$



The first two terms on the right-hand side account for the change in $N_{kl}$ due to the addition of a link onto a node of degree $k-1$ (gain) or $k$ (loss), while the second set of terms gives the change in $N_{kl}$ due to the addition of a link onto the ancestor node. Finally, the last term accounts for the gain in $N_{1l}$ due to the addition of the new node.

Asymptotically, $M_1\to 2t$ and $N_{kl}\to t\,n_{kl}$, and we use these hypotheses to reduce Eqs. (36) to the time-independent recursion relations

$$(k+l+2)\,n_{kl} = (k-1)\,n_{k-1,l} + (l-1)\,n_{k,l-1} + (l-1)\,n_{l-1}\,\delta_{k1}. \tag{37}$$

This can be reduced to a constant-coefficient inhomogeneous recursion relation by the substitution

$$n_{kl} = \frac{\Gamma(k)\,\Gamma(l)}{\Gamma(k+l+3)}\,m_{kl} \tag{38}$$

to yield

$$m_{kl} = m_{k-1,l} + m_{k,l-1} + 4(l+2)\,\delta_{k1}. \tag{39}$$



By solving Eqs. (39) for the first few $k$, one can grasp the pattern of dependence on $k$ and $l$ and thereby infer the general solution

$$m_{kl} = 4\,\frac{\Gamma(k+l)}{\Gamma(k+2)\,\Gamma(l-1)} + 12\,\frac{\Gamma(k+l-1)}{\Gamma(k+1)\,\Gamma(l-1)}. \tag{40}$$



This solution can also be obtained in a more systematic manner by the generating function method (see below for the shifted linear kernel). Combining Eqs. (38) and (40), we finally obtain

$$n_{kl} = \frac{4(l-1)}{k(k+1)(k+l)(k+l+1)(k+l+2)} + \frac{12(l-1)}{k(k+l-1)(k+l)(k+l+1)(k+l+2)}. \tag{41}$$
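The closed form (41) can be tested against the recursion (37) in exact rational arithmetic. The sketch below is illustrative; the function names are assumptions, and the boundary values $n_{0,l} = n_{k,1} = 0$ are imposed explicitly.

```python
from fractions import Fraction

def n_k(k):
    """Degree distribution for the linear kernel, n_k = 4/[k(k+1)(k+2)]."""
    return Fraction(4, k * (k + 1) * (k + 2))

def n_kl(k, l):
    """Eq. (41); taken as zero outside k >= 1, l >= 2."""
    if k < 1 or l < 2:
        return Fraction(0)
    s = k + l
    first = Fraction(4 * (l - 1), k * (k + 1) * s * (s + 1) * (s + 2))
    second = Fraction(12 * (l - 1), k * (s - 1) * s * (s + 1) * (s + 2))
    return first + second

def recursion_residual(k, l):
    """Left side minus right side of Eq. (37); zero if Eq. (41) solves it."""
    rhs = (k - 1) * n_kl(k - 1, l) + (l - 1) * n_kl(k, l - 1)
    if k == 1:
        rhs += (l - 1) * n_k(l - 1)   # source term from the newly added node
    return (k + l + 2) * n_kl(k, l) - rhs

residuals = [recursion_residual(k, l) for k in range(1, 13) for l in range(2, 13)]
```

For instance, `n_kl(1, 2)` evaluates to 2/15, and the residual vanishes identically on the grid tested.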
The important feature of this result is that the joint distribution does not factorize, that is, $n_{kl}\neq n_k n_l$. This confirms our earlier assertion that correlations between the degrees of connected nodes form spontaneously. This is arguably the most important distinction between classical random graphs, where node degrees are uncorrelated, and the GN.

While the solution of Eq. (41) is unwieldy, it greatly simplifies in the scaling regime, $k\to\infty$ and $l\to\infty$ with $y = l/k$ kept finite. The scaled form of the solution is

$$n_{kl} = k^{-4}\,\frac{4y(y+4)}{(1+y)^4}. \tag{42}$$



For fixed large $k$, the distribution $n_{kl}$ has a single maximum at $y^* = (\sqrt{33}-5)/2 \approx 0.372$. Thus a node whose degree $k$ is large is typically linked to another node whose degree is also large; the typical degree of the ancestor is 37% of the degree of the daughter node. In the complementary case of a fixed degree $l$ for the ancestor node, the distribution $n_{kl}$ reaches its maximum when $k = 1$, i.e., the daughter node is usually dangling. From Eq. (41), we find that this configuration occurs with probability

$$n_{1l} = \frac{2(l-1)(l+6)}{l(l+1)(l+2)(l+3)}. \tag{43}$$



Finally, when both $k$ and $l$ are large and their ratio is very different from one, the limiting behaviors of $n_{kl}$ are

$$n_{kl} \to \begin{cases} 16\,l/k^5 & \text{when } l\ll k,\\ 4/(k^2 l^2) & \text{when } l\gg k. \end{cases} \tag{44}$$

This last result demonstrates the correlations in the network most cleanly. If there were no correlations, then $n_{kl} = n_k n_l$ would be proportional to $(kl)^{-3}$.



B. General connection kernels 

In general, correlations between the degrees of neigh- 
boring connected nodes exist for any attachment kernel. 
The analysis of these correlations for an arbitrary kernel 
is tedious and we merely outline some of the primary re- 
sults in the relatively simple cases of the shifted linear 
and constant attachment kernels. 

In the former case, we follow the same approach as for the linear kernel to reduce the rate equation for the correlation function to recursion relations of a similar form to Eq. (37), viz.

$$(k+l+2+3w)\,n_{kl} = (k+w-1)\,n_{k-1,l} + (l+w-1)\left[n_{k,l-1} + n_{l-1}\,\delta_{k1}\right]. \tag{45}$$

Here $n_l$ is determined from Eq. (12). In analogy with Eq. (38), the substitution

$$n_{kl} = \frac{\Gamma(k+w)\,\Gamma(l+w)}{\Gamma(k+l+3+3w)}\,m_{kl} \tag{46}$$

reduces Eqs. (45) to

$$m_{kl} = m_{k-1,l} + m_{k,l-1} + \delta_{k1}\,W\,\frac{\Gamma(l+3+3w)}{\Gamma(l+2+2w)}, \tag{47}$$

where $W = (2+w)\,\Gamma(3+2w)/[\Gamma(1+w)]^2$. We solve the recursion (47) by the generating function method. Multiplying Eq. (47) by $x^k y^l$ and summing over all $k\geq 1$, $l\geq 2$, we find that the generating function

$$\mathcal{M}(x,y) = \sum_{k=1}^{\infty}\sum_{l=2}^{\infty} m_{kl}\,x^k y^l \tag{48}$$

is given by

$$\mathcal{M}(x,y) = \frac{W x y^2}{1-x-y}\,\sum_{j=0}^{\infty}\frac{\Gamma(j+5+3w)}{\Gamma(j+4+2w)}\,y^j. \tag{49}$$



Expanding $\mathcal{M}(x,y)$, we obtain

$$m_{kl} = W\sum_{j=0}^{l-2}\frac{\Gamma(k+l-2-j)}{\Gamma(k)\,\Gamma(l-1-j)}\;\frac{\Gamma(j+5+3w)}{\Gamma(j+4+2w)}. \tag{50}$$

Eqs. (46) and (50) constitute the exact solution for the correlation function of the GN with the shifted linear attachment kernel.

When the parameter $w$ is an integer, we can reduce $n_{kl}$ to a rational function. In the general case, the exact solution also simplifies in several extreme limits. When $k\gg l$, the dominant contribution to $n_{kl}$ is provided by the first term in the sum in Eq. (50). Assuming additionally $l\gg 1$ and repeatedly using the asymptotic relation $\Gamma(N+n)/\Gamma(N)\to N^n$ as $N\to\infty$, we ultimately find

$$n_{kl} \to W\,\frac{\Gamma(5+3w)}{\Gamma(4+2w)}\,\frac{l^{1+w}}{k^{5+2w}}, \qquad k\gg l\gg 1. \tag{51}$$

In the complementary case of $l\gg k\gg 1$, all the terms in the sum of Eq. (50) are important. However, we can simplify this sum by employing the above asymptotics for the ratio of gamma functions and then replacing the sum by an easily computable integral. We find

$$n_{kl} \to W\,\Gamma(2+w)\,\frac{1}{k^2\,l^{2+w}}, \qquad l\gg k\gg 1. \tag{52}$$



When the attachment kernel is uniform, correlations between the degrees of a node and its ancestor still develop. To see how this comes about quantitatively, we again follow the same steps as those which led to Eq. (37) and find that the joint distribution $n_{kl}$ now satisfies the recursion relation

$$3\,n_{kl} = n_{k-1,l} + n_{k,l-1} + 2^{-(l-1)}\,\delta_{k1}. \tag{53}$$



This recursion relation can again be solved by the generating function technique to give

$$n_{kl} = 2^{-(l-1)} - 3^{-(l-1)}\sum_{i=0}^{k-1}\frac{\Gamma(l-1+i)}{\Gamma(l-1)\,\Gamma(i+1)}\,\frac{1}{3^{i}}. \tag{54}$$
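The solution for the uniform kernel can be checked exactly against the recursion (53). In this sketch the ratio of gamma functions in Eq. (54) is written as the binomial coefficient $\binom{l-2+i}{i}$; the function names and the explicit boundary values $n_{0,l} = n_{k,1} = 0$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def n_kl_uniform(k, l):
    """Eq. (54), using Gamma(l-1+i)/[Gamma(l-1) Gamma(i+1)] = C(l-2+i, i)."""
    if k < 1 or l < 2:
        return Fraction(0)
    partial = sum(Fraction(comb(l - 2 + i, i), 3 ** i) for i in range(k))
    return Fraction(1, 2 ** (l - 1)) - partial / 3 ** (l - 1)

def residual(k, l):
    """Left side minus right side of Eq. (53)."""
    source = Fraction(1, 2 ** (l - 1)) if k == 1 else Fraction(0)
    return (3 * n_kl_uniform(k, l) - n_kl_uniform(k - 1, l)
            - n_kl_uniform(k, l - 1) - source)

residuals = [residual(k, l) for k in range(1, 10) for l in range(2, 10)]
```

At $k=1$ the sum has a single term, which gives $n_{1l} = 2^{-(l-1)} - 3^{-(l-1)}$ directly.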



To appreciate the qualitative behavior of the joint distribution $n_{kl}$, it is again useful to fix one variable and vary the other. For fixed $l$, Eq. (54) shows that $n_{kl}$ has a maximum at $k = 1$. The magnitude of this maximum is $n_{1l} = 2^{-(l-1)} - 3^{-(l-1)}$. To analyze the behavior when $k$ is fixed, it is convenient to transform Eq. (54) into

$$n_{kl} = \frac{1}{3^{l-1}}\sum_{i=k}^{\infty}\frac{\Gamma(l-1+i)}{\Gamma(l-1)\,\Gamma(i+1)}\,\frac{1}{3^{i}}. \tag{55}$$



Now a straightforward analysis shows that for large $k$, the maximum is attained at $l = k/2$.

The form of the joint distribution $n_{kl}$ remains relatively complex even in the scaling regime, $k, l\to\infty$, with the scaling variable $y = l/k$ kept finite. We determine the scaled form of the solution (55) by applying Stirling's formula and the identity $\Gamma(x+A)/\Gamma(x)\to x^{A}$ as $x\to\infty$. For $y<2$, we find

$$n_{kl} \to \frac{9\,y^{3/2}}{(2-y)(1+y)^{3/2}}\;\frac{1}{\sqrt{2\pi k}}\;e^{-kY}, \tag{56}$$

where $Y = y\ln y - (y+1)\ln[(y+1)/3]$. For $y>2$, it is preferable to use the solution in the form of Eq. (54). After some algebra, we can verify that the dominant contribution equals $2^{-(l-1)}$, that is, independent of $k$.

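Finally, the limiting behavior of the correlation function is

$$n_{kl} \to \begin{cases} \dfrac{1}{2}\,3^{-(k+l-2)}\,\dfrac{k^{l-2}}{(l-2)!} & \text{when } l\ll k,\\[2mm] 2^{-(l-1)} & \text{when } l\gg k. \end{cases} \tag{57}$$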



Thus, correlations are strong even for the random at- 
tachment kernel and the qualitative behavior is similar 
to that of the linear attachment kernel. 



VI. LARGE-SCALE PROPERTIES 

The degree of a node is an important but local network characteristic and we now seek to quantify more global features of the network. One such characteristic is the partitioning of the network into an in-component and an out-component with respect to any node (Fig. 5).


The in-component to node x is the set of all nodes 
from which node x can be reached by following a path 
of directed links. Similarly, the out-component of node
x is the set of nodes which can be reached by following
the path of directed links which emanate from node x.
For the GN model, the out-component is just a single 
path, while in more realistic networks both the in- and 
out-components will be branched. In the context of ci- 
tations, the in-component is the set of all publications 
which refer to x, either directly or through intermedi- 
ate reference lists until x is reached. The out-component 
is the set of cited publications generated by iteratively 
following the reference list(s) of x and its ancestors. 



A. In-component size distribution 



The size distribution of the in-component can be easily obtained by the rate equation formalism for the GN with a uniform attachment kernel and also for the GNR. Given the equivalence between the latter and the GN with a shifted linear kernel, that case is also soluble. We start by considering the GN with the uniform attachment kernel. In this case, the number $I_s(t)$ of in-components with $s$ nodes satisfies the rate equation

$$\frac{dI_s}{dt} = \frac{(s-1)I_{s-1} - sI_s}{t} + \delta_{s1}. \tag{58}$$



FIG. 5. In-component and out-component of node x.



To understand this equation, consider first the loss term. For an in-component of size $s$ there are $s$ nodes at which the attachment of a new node causes this component to increase in size by one. This gives a loss rate for $I_s$ which is proportional to $s$. If there is more than one in-component of size $s$ they must be disjoint, so that the total loss rate for $I_s$ is simply $sI_s$. A similar argument applies for the gain term. Finally, the overall factor of $t^{-1}$ converts these rates to normalized probabilities. Curiously, Eqs. (58) are almost identical to the rate equations for the degree distribution of the GN with linear attachment kernel, except that the prefactor equals $t^{-1}$ rather than $(2t)^{-1}$.

From Eqs. (58) we can determine all moments of the in-component size distribution, $\mathcal{I}_n(t) = \sum_{s\geq 1} s^n I_s(t)$. The zeroth moment obeys $\dot{\mathcal{I}}_0 = 1$, whose solution is $\mathcal{I}_0(t) = \mathcal{I}_0(0) + t$. This is obvious since the total number of in-components equals the total number of nodes. The first moment obeys $\dot{\mathcal{I}}_1 = 1 + \mathcal{I}_1/\mathcal{I}_0$, whose asymptotic solution is $\mathcal{I}_1(t)\simeq t\ln t$. We shall see that this logarithmic factor is an outcome of the asymptotic power law for $I_s$ with the tail decaying as $s^{-2}$.

To solve for $I_s(t)$, we note that it again grows linearly in time. Thus we substitute the ansatz $I_s(t) = t\,i_s$ into Eqs. (58) to obtain $i_1 = 1/2$ and $i_s = i_{s-1}(s-1)/(s+1)$, which immediately leads to

$$i_s = \frac{1}{s(s+1)}. \tag{59}$$
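The step from the recursion to the closed form can be verified by iterating in exact arithmetic. This is a minimal sketch; the function name and the cutoff of 50 are arbitrary choices made for illustration.

```python
from fractions import Fraction

def in_component_fractions(s_max):
    """Iterate i_s = i_{s-1} (s - 1)/(s + 1), starting from i_1 = 1/2."""
    i = {1: Fraction(1, 2)}
    for s in range(2, s_max + 1):
        i[s] = i[s - 1] * Fraction(s - 1, s + 1)
    return i

fractions_by_size = in_component_fractions(50)
# Truncated zeroth and first moments of the normalized distribution
zeroth_moment = sum(fractions_by_size.values())
first_moment = sum(s * i_s for s, i_s in fractions_by_size.items())
```

The truncated zeroth moment telescopes to $1 - 1/(s_{\max}+1)$, while the truncated first moment is a harmonic sum, which is the origin of the logarithmic factor in $\mathcal{I}_1$.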



For this decay, the moments $\mathcal{I}_n$ diverge when $n\geq 1$. However, the size of the largest in-component, $s_{\max} = t$, provides an upper cutoff in the computation of the moments. For example, $\mathcal{I}_1\sim\sum_{s\leq t} s\,I_s(t)\simeq t\ln t$. It is intriguing that the algebraic in-component distribution co-exists with an exponential in-degree distribution, $n_k = 2^{-k}$.

Similarly, we can determine $I_s(t)$ for the GNR model. In this case, the number $I_s(t)$ of in-components with $s$ nodes satisfies

$$\frac{dI_s}{dt} = \frac{(s-2+(1-r))\,I_{s-1} - (s-1+(1-r))\,I_s}{t} \tag{60}$$



for $s\geq 2$, and $\dot I_1 = 1 - (1-r)I_1/t$. This rate equation can be understood in a similar manner as Eq. (58). Consider the loss term for an in-component of size $s$. There are two possibilities to consider: (i) If the apex of the in-component is initially chosen, then the new node will attach to this apex with probability $1-r$ (i.e., attach with no re-direction); (ii) If any other of the $s-1$ nodes of the in-component is chosen, the new node will surely attach to the in-component even if re-direction occurs. These two processes give a loss rate for $I_s$ which is proportional to $(s-1+(1-r))I_s$. Solving for the in-component distribution in this process now yields $I_s(t) = t\,i_s$, with

$$i_s = \frac{1-r}{(s-r)(s+1-r)}. \tag{61}$$
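Eq. (61) can be compared with a direct simulation of the GNR growth rule. The sketch below is illustrative only: the function names, the network size, and the convention that a link chosen for re-direction from the seed stays on the seed are assumptions. The in-component size of a node is one plus the number of its descendants.

```python
import random
from collections import Counter

def in_component_sizes(n_nodes, r, seed=0):
    """Grow a GNR tree and return the in-component size of every node."""
    rng = random.Random(seed)
    ancestor = [None]                      # node 0 is the seed
    for new in range(1, n_nodes):
        target = rng.randrange(new)        # uniformly chosen earlier node
        if rng.random() < r and ancestor[target] is not None:
            target = ancestor[target]      # re-direction to the ancestor
        ancestor.append(target)
    size = [1] * n_nodes                   # each node counts itself
    # Descendants always carry larger indices, so a reverse sweep
    # accumulates complete subtree sizes.
    for node in range(n_nodes - 1, 0, -1):
        size[ancestor[node]] += size[node]
    return size

def predicted_fraction(s, r):
    """Eq. (61)."""
    return (1 - r) / ((s - r) * (s + 1 - r))

r = 0.5
sizes = in_component_sizes(20000, r)
counts = Counter(sizes)
measured = {s: counts[s] / len(sizes) for s in (1, 2, 3)}
```

For $r = 1/2$ the predicted fractions are $2/3$, $2/15$ and $2/35$ for $s = 1, 2, 3$, in line with $i_s = 2/(4s^2-1)$.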



Remarkably, the asymptotic power law $I_s\propto s^{-2}$ holds for any $r$. It is striking that this apparently universal behavior has also recently been observed in measurements of the Internet.

Since the GNR model is identical to the GN with the shifted linear attachment kernel $A_k = k + (r^{-1}-2)$, Eq. (61) also applies to the in-component distribution for the GN with shifted linear attachment kernels. For example, the in-component distribution for the linear kernel is $i_s = 2/(4s^2-1)$. Since the same $I_s\propto s^{-2}$ decay holds for the GN with both constant and linear attachment kernels, we conjecture that the in-component distribution exhibits a universal $s^{-2}$ decay for an arbitrary attachment kernel, as long as it does not grow faster than linearly with node degree.



B. Out-component size distribution 

The out-component from each node reveals basic in- 
sights about the "genealogy" of the growing network in 
an extremely simple fashion. For example, it allows us 
to estimate the diameter of the network, an important 
characteristic which has been measured for the web graph 
MM and for social networks 121. 



For this characterization, we begin by reorganizing the GN into a genealogical tree according to a procedure which is suggested by the growth process itself. Generation $g = 0$ contains the single "seed" node. The nodes which attach to the seed node form generation $g = 1$, and generally the nodes which attach to nodes in generation $g$ form generation $g+1$, independent of when the attachment actually occurs. Thus the position of a node in the genealogical tree depends only on the position of the ancestor node and not on when the node is introduced. In this respect, the GN genealogical tree differs from usual genealogies, where each new generation is born into a progressively later position in the genealogical tree. For example, the network of Fig. 1 has 5 nodes in the first generation and 4 nodes in the second generation, leading to the genealogical tree of Fig. 6. The sizes of all generations grow continuously, except for generation $g = 0$, which always consists of the single node.



FIG. 6. Genealogy of the growing random network of Fig. 1. The indices indicate when a node is introduced, while the ancestor determines where a new node is positioned.

Once we understand the genealogical structure of the GN, we simultaneously establish the out-component distribution. Indeed, the number $O_s$ of out-components with $s$ nodes equals $L_{s-1}$, the number of nodes in generation $s-1$ in the genealogical tree. We therefore compute $L_g(t)$, the size of generation $g$ at time $t$. We start with the simplest situation when the attachment rate is uniform. In this case, $L_g(t)$ increases when a new node attaches to a node in the previous generation. This occurs with rate $L_{g-1}/M_0$, where $M_0(t) = 1+t$ is the total number of nodes. Because of the simplicity of the corresponding rate equations, we use the exact expression for $M_0$ rather than the asymptotic expression $M_0\simeq t$, as was done in solving for the in-component. Thus we write

$$\frac{dL_g}{dt} = \frac{L_{g-1}}{1+t}. \tag{62}$$

Solving these equations gives

$$L_g(\tau) = \frac{\tau^{g}}{g!}, \quad\text{where}\quad \tau = \ln(1+t). \tag{63}$$

We therefore conclude that for a fixed (large) time, the generation size grows with $g$ when $g<\tau$, reaches a maximum size which is equal to

$$L_{\max} = \frac{t}{\sqrt{2\pi\ln t}} \tag{64}$$

when $g = \tau$, and then decreases and eventually becomes of order one when $g = e\tau$. The distribution $L_g$ quickly decays when $g$ exceeds the cutoff value $e\tau$. At time $t$, the genealogical tree therefore contains approximately $e\tau$ generations. Hence the diameter $D$ of the network is approximately $2e\tau$, or

$$D \approx 2e\ln N, \tag{65}$$




where N = 1 + t is the total number of nodes. Thus, the 
diameter of an evolving GN exhibits the same N depen- 
dence as a static random graph [Q. 
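The shape of the generation profile can be illustrated by evaluating Eq. (63) directly. The sketch below uses hypothetical helper names and an arbitrary time $t = 10^6$; it locates the most populated generation and the last generation that still holds at least one node on average, for comparison with $\tau$, with $e\tau$, and with Eq. (64).

```python
from math import e, exp, lgamma, log, pi, sqrt

def generation_size(g, t):
    """Eq. (63): L_g = tau**g / g!, with tau = ln(1 + t)."""
    tau = log(1.0 + t)
    return exp(g * log(tau) - lgamma(g + 1))

def generation_profile(t, g_limit=200):
    tau = log(1.0 + t)
    sizes = [generation_size(g, t) for g in range(g_limit)]
    g_peak = max(range(g_limit), key=sizes.__getitem__)
    # last generation whose average size is still at least one
    g_cut = max(g for g in range(g_limit) if sizes[g] >= 1.0)
    return tau, g_peak, g_cut, sizes[g_peak]

t = 1e6
tau, g_peak, g_cut, peak_size = generation_profile(t)
peak_estimate = t / sqrt(2 * pi * log(t))   # Eq. (64)
cutoff_estimate = e * tau                   # generation where L_g is of order one
```

Because of the $1/\sqrt{2\pi g}$ factor in Stirling's formula, the measured cutoff sits slightly below $e\tau$ at finite $t$; the estimate $D\approx 2e\ln N$ is therefore an asymptotic statement.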

We can also find the generation size distribution for shifted linear attachment kernels. It is again simpler to derive the rate equations in the framework of the GNR model and then transcribe the results to the shifted linear kernel. For the GNR model, the rate equation for the generation size distribution is

$$\frac{dL_g}{dt} = \frac{(1-r)L_{g-1} + rL_g}{1+t} \tag{66}$$



for $g>1$, and $\dot L_1 = (1+t)^{-1}[1 + rL_1]$. The first term in (66) has the same origin as in the GN without re-direction and the second term accounts for the change in $L_g$ due to the re-direction. In the latter case, the new node provisionally attaches to a node in generation $g$; this occurs with relative probability $L_g$. However, by the re-direction process, this new node actually attaches to a node in generation $g-1$ and thereby joins generation $g$.

To solve Eq. (66), we again use $\tau = \ln(1+t)$ and apply the Laplace transform technique. After some elementary steps, we obtain

$$L_g(\tau) = \int_0^{\tau} du\,\frac{[(1-r)u]^{g-1}}{(g-1)!}\,e^{ru}. \tag{67}$$



From this solution, we find that for a fixed (large) time, the generation size grows with $g$ when $g<(1-r)\tau$, reaches a maximum value $L_{\max} = t/\sqrt{2\pi(1-r)\ln t}$ at $g = (1-r)\tau$, and then decreases when $g>(1-r)\tau$. Eventually the generation size becomes of order one when $g = G\tau$, where $G$ is the root of the equation $G\ln[G/(1-r)] = G + r$. The diameter of the network is then $D\approx 2G\tau$.

These two solvable cases again suggest that the genealogy of the GN is robust, as long as the attachment kernel does not grow faster than linearly with node degree. For the super-linear kernels, however, the genealogy changes drastically. When the attachment exponent $\gamma$ exceeds 2, there will be only a few generations overall, and one generation $g^*$ will contain all but a finite number of nodes. For such a network, the gel node will reside in generation $g^*-1$. When the attachment exponent lies in the range $1<\gamma<2$, a single generation will also contain almost all $t$ nodes. However, the number of nodes which reside in other generations is of order $t^{2-\gamma}$ and thus grows as well. Additionally, the number of non-empty generations grows indefinitely with the total number of nodes.



The above results can be reformulated in terms of the out-component distribution. In particular, for the GN with uniform attachment kernel, the number $O_s$ of out-components with $s$ nodes equals

$$O_s(t) = \frac{\tau^{s-1}}{(s-1)!}, \quad\text{where}\quad \tau = \ln(1+t). \tag{68}$$



Similar results apply for the linear attachment kernel, 
suggesting that the out-component distribution is robust 
as long as the attachment kernel does not grow faster 
than linearly with node degree. 



VII. DISCUSSION AND CONCLUSIONS 

In this paper, we have analyzed the structure of the 
growing network (GN) model and shown that many of its 
properties can be easily determined within a rate equation approach. We have found that the GN has a power-law node degree distribution, $N_k(t)\sim t\,k^{-\nu}$, for asymptotically linear attachment kernels, with an exponent $\nu$ which is always larger than 2. By tuning parameters of
the model in a reasonable way, it is easy to obtain a node 
degree distribution which is in quantitative agreement 
with available data for the web graph ||l^- |l6| , |l9| - ^l]j49|] . 

A remarkable feature of this network is the sponta- 
neous development of correlations between connected 
nodes. These correlations provide a much more sensi- 
tive characterization of the structure of growing networks 
than the extensively studied degree distribution. These 
correlations are a crucial feature which distinguishes the 
GN from classical random graphs. Thus testing for the 
presence of correlations between node degrees in large 
evolving networks may provide crucial insights to help 
determine the underlying mechanism of their growth. 

We have also studied two specific large-scale properties 
of the network, namely, the size distributions of the in- 
and out-components with respect to a given site. The 
in-component distribution exhibits a robust $s^{-2}$ power-law behavior, where $s$ is the component size, as long as
the attachment probability does not grow faster than lin- 
early with node degree. The out-component distribution 
reveals the basic genealogical feature that the number of 
"generations" in the network grows logarithmically with 
the total number of nodes, again for attachment kernels 
which do not grow faster than linearly in node degree. 

The qualitative agreement between the degree distri- 
butions of real evolving networks, such as the web graph, 
and the GN is reassuring given that the model ignores 
many important features of real networks. Nevertheless, 
a number of characteristics of real growing networks are 
difficult to treat in the framework of the GN model. One 
important such characteristic is the out-degree distribu- 
tion. Within the GN model the out-degree of each node 
is one by construction. In contrast, for real growing net- 
works the out-degree distribution has a power law form 
P9[. Additionally, the average in- and out-degrees at 
each node are generally larger than one. For the web graph, for example, $\langle i\rangle = \langle j\rangle \approx 7.5$.

There are several natural ways to extend the GN model 
to generate an average out-degree which is greater than 
one. A simple construction is to link every new node 
to more than one earlier node, as already discussed in 
Ref . . Let us consider a network which is built by at- 
taching every new node to exactly p earlier nodes. For the 
linear attachment kernel, the degree distribution $N_k(t)$, which is now defined only for $k\geq p$, evolves according to

$$\frac{dN_k}{dt} = \frac{p}{M_1}\left[(k-1)N_{k-1} - kN_k\right] + \delta_{kp}. \tag{69}$$



Clearly, the average in-degree $\langle i\rangle$ and out-degree $\langle j\rangle$ of each node in this network are equal to $p$. By applying the basic approach of Sec. III to this rate equation, we find that the degree distribution again asymptotically approaches a stable distribution $N_k\to t\,n_k$, with

$$n_k = \frac{2p(p+1)}{k(k+1)(k+2)} \quad\text{for } k\geq p. \tag{70}$$



Thus for the linear attachment kernel, the average node degree does not affect the exponent $\nu$ of the degree distribution. However, for other solvable examples, the new feature of attaching the new node to more than one pre-existing node leads to different degree distributions. For example, for the shifted linear kernel we find

$$n_k = \text{const.}\times\frac{\Gamma(k+w)}{\Gamma(k+3+w+w/p)} \quad\text{for } k\geq p, \tag{71}$$

$$n_p = \left[1 + p\,\frac{p+w}{2p+w}\right]^{-1}. \tag{72}$$

This gives the asymptotic behavior $n_k\sim k^{-(3+w/p)}$. Thus the exponent of the degree distribution depends on the average node degree, with $\nu = 3 + w/p$.

The multiple linking construction also reduces the number of nodes with in-degree zero. For example, for the GN with the shifted linear attachment kernel, the fraction of such nodes is $n_1 = (2+w)/(3+2w)$, which is always larger than 1/2. However, for the multiple linking construction, the fraction of nodes with in-degree zero is reduced to the value $n_p$ given in Eq. (72). If we use $p = 7$ to reproduce the correct average node degree of the web graph, the fraction of nodes with in-degree zero always exceeds 1/8, which, however, apparently disagrees with web data. Thus while multiple attachment does reduce the number of poorly connected nodes, this reduction is still insufficient to account for web-graph data. However, it is clear that the multiple linking construction has the potential to provide a better description of citation data.
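The bounds quoted above can be checked numerically. The sketch below assumes that Eq. (72) has the form $n_p = [1 + p(p+w)/(2p+w)]^{-1}$, which follows from the $k = p$ rate equation with $A(t) = (2p+w)t$ and reduces to $(2+w)/(3+2w)$ at $p = 1$; the function names are illustrative.

```python
def zero_in_degree_fraction(p, w):
    """Assumed form of Eq. (72): fraction of nodes with no incoming links."""
    return 1.0 / (1.0 + p * (p + w) / (2.0 * p + w))

def degree_exponent(p, w):
    """Exponent nu = 3 + w/p of the degree distribution."""
    return 3.0 + w / p

# Fraction of nodes with in-degree zero for p = 7 and several shifts w > -p
web_like = {w: zero_in_degree_fraction(7, w) for w in (-3.0, 0.0, 1.0, 10.0, 1e6)}
```

As $w\to\infty$ the fraction approaches $1/(1+p)$ from above, which for $p = 7$ is the lower bound of 1/8 mentioned in the text.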

Another shortcoming of the multiple attachment con- 
struction is that it cannot dynamically generate a non- 
trivial out-degree distribution. However, we can extend 
the GN model by allowing for creation of links between 



existing nodes jS^l- This simple construction allows us 
to generate non-trivial out-degree distributions which 
closely match web graph data. An even more challenging 
direction is to describe the global topological structure 
of growing networks. The GN model leads to a single- 
component tree graph, while the web graph has numerous 
disconnected components. A deeper understanding of the 
web graph may provide valuable insights to help develop 
algorithms for web crawling, searching, and community 
discovery. 